Regularizing Attention Scores with Bootstrapping
Neo Christopher Chung, Maxim Laletin
Vision transformers (ViTs) rely on the attention mechanism to weigh input features, so attention scores have naturally been considered as explanations for their decision-making process. However, attention scores are almost always non-zero, which produces noisy, diffuse attention maps and limits interpretability. Can we quantify the uncertainty of attention scores and obtain regularized attention scores? To this end, we consider the attention scores of ViTs in a statistical framework in which independent noise would lead to insignificant yet non-zero scores. Leveraging statistical learning techniques, we introduce bootstrapping for attention scores, which generates a baseline distribution of attention scores by resampling input features. This bootstrap distribution is then used to estimate the significance and posterior probabilities of attention scores. On natural and medical images, the proposed Attention Regularization approach straightforwardly removes spurious attention arising from noise, drastically improving shrinkage and sparsity. Quantitative evaluations are conducted on both simulated and real-world datasets. Our study highlights bootstrapping as a practical regularization tool when attention scores are used as explanations for ViTs. Code is available at https://github.com/ncchung/AttentionRegularization
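The bootstrap idea in the abstract (build a baseline distribution of attention scores from resampled inputs, then keep only scores that are significant against it) can be sketched in plain Python. This is a minimal illustration under loudly stated assumptions, not the authors' implementation (see the linked repository for that): it assumes a single attention head with given query/key projection matrices, a hypothetical baseline that redraws every input entry with replacement from the pooled entries, a one-sided empirical p-value per token position, and hard thresholding at a level `alpha`. The posterior-probability estimate mentioned in the abstract is not covered. All function names (`cls_attention`, `resample_features`, `bootstrap_regularize`) are hypothetical.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def project(X, W):
    # Row-wise linear projection: X is n x d, W is d x k, result is n x k.
    return [[sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]
            for x in X]


def cls_attention(X, Wq, Wk):
    # Attention weights of token 0 (a CLS-like query) over all n tokens
    # for one head: softmax(q_0 . k_j / sqrt(d_k)).
    Q, K = project(X, Wq), project(X, Wk)
    scale = math.sqrt(len(Wk[0]))
    return softmax([sum(q * k for q, k in zip(Q[0], key)) / scale for key in K])


def resample_features(X, rng):
    # Hypothetical baseline: draw every entry with replacement from the pooled
    # entries of X, keeping their marginal distribution but destroying structure.
    pool = [v for row in X for v in row]
    return [[rng.choice(pool) for _ in row] for row in X]


def bootstrap_regularize(X, Wq, Wk, n_boot=200, alpha=0.05, seed=0):
    rng = random.Random(seed)
    observed = cls_attention(X, Wq, Wk)
    # null[j] collects the bootstrap attention scores at token position j.
    null = [[] for _ in observed]
    for _ in range(n_boot):
        for j, s in enumerate(cls_attention(resample_features(X, rng), Wq, Wk)):
            null[j].append(s)
    # One-sided empirical p-value with the usual +1 correction.
    p_values = [(1 + sum(s >= o for s in null_j)) / (1 + n_boot)
                for o, null_j in zip(observed, null)]
    # Keep only scores that exceed their bootstrap baseline; zero the rest.
    regularized = [o if p <= alpha else 0.0 for o, p in zip(observed, p_values)]
    return observed, p_values, regularized


if __name__ == "__main__":
    data_rng = random.Random(1)
    # Toy input: 6 tokens with 4 features each; identity projections (assumption).
    X = [[data_rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(6)]
    I = [[float(i == j) for j in range(4)] for i in range(4)]
    obs, p, reg = bootstrap_regularize(X, I, I, n_boot=100, alpha=0.05)
    print(obs, p, reg, sep="\n")
```

Because each p-value is at least `1 / (1 + n_boot)`, the number of bootstrap draws bounds how small `alpha` can usefully be; with `alpha` below that floor every score is zeroed.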
Energy Consumption Analysis Details
The spike firing rate is defined as the proportion of non-zero elements in a spike tensor. In Table S1, we present the spike firing rates of all spiking tensors in spike-driven Transformer-8-512. SNNs are theoretically more energy efficient than their ANN counterparts, since binary spikes turn multiply-accumulate operations into sparse accumulate operations that are triggered only by non-zero spikes. We employ two types of datasets: static image classification and neuromorphic classification. ImageNet-1K, the most typical static image dataset, is widely used for image classification.
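The firing-rate definition above can be made concrete with a short sketch. It assumes spike tensors are given as arbitrarily nested Python lists of 0/1 values; the helper names `_flatten` and `spike_firing_rate` and the toy tensor are hypothetical and are not taken from the spike-driven Transformer code.

```python
def _flatten(tensor):
    # Yield the scalar entries of an arbitrarily nested list/tuple "tensor".
    if isinstance(tensor, (list, tuple)):
        for item in tensor:
            yield from _flatten(item)
    else:
        yield tensor


def spike_firing_rate(spikes):
    # Firing rate = (number of non-zero elements) / (total number of elements).
    values = list(_flatten(spikes))
    if not values:
        raise ValueError("spike tensor is empty")
    return sum(1 for v in values if v != 0) / len(values)


# Toy binary spike tensor: 2 time steps x 2 neurons x 2 features (hypothetical data).
spikes = [[[0, 1], [0, 0]], [[1, 1], [0, 0]]]
print(spike_firing_rate(spikes))  # 0.375
```

Three of the eight entries fire, giving 0.375; a lower rate means fewer accumulate operations are triggered.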